Summary

For this assignment, I used remake to automate a data analysis pipeline.

- The remake.yml file lists all the targets and their dependencies. Every step in the file is explained with comments.
- The code.R file contains the functions (rules) that connect the targets to each other. Every step in that file is also explained with comments.
- How it works: remake.yml calls the functions defined in code.R to generate each target, which automates the pipeline. See those two files for more detailed explanations.

Processed Data

This is what the processed gapminder data looks like.

knitr::kable(head(read.csv("summary_dat.csv")))
| X | country | continent | year | lifeExp | pop | gdpPercap | newpop | weight | weighted_mean_gdp |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 | 0.8425333 | 0.0001670 | 0.1301948 |
| 2 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 | 0.9240934 | 0.0001832 | 0.1503842 |
| 3 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 | 1.0267083 | 0.0002035 | 0.1736474 |
| 4 | Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 | 1.1537966 | 0.0002287 | 0.1912753 |
| 5 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 | 1.3079460 | 0.0002593 | 0.1918807 |
| 6 | Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 | 1.4880372 | 0.0002950 | 0.2319102 |
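
The derived columns in the rows above look consistent with a simple recipe: newpop appears to be pop divided by ten million, weight appears to be each row's share of the population summed over all rows, and weighted_mean_gdp appears to be gdpPercap multiplied by weight. The sketch below reproduces that recipe in base R. It is an inference from the printed values, not the actual process_data() in code.R, and the name process_data_sketch is hypothetical.

```r
# Hypothetical sketch of a processing step; NOT the author's process_data().
# The column formulas are inferred from the printed rows above.
process_data_sketch <- function(df) {
  # Rescale population to units of ten million people (8425333 -> 0.8425333).
  df$newpop <- df$pop / 1e7
  # Each row's share of the population summed over all rows of df.
  df$weight <- df$newpop / sum(df$newpop)
  # Weight GDP per capita by that share.
  df$weighted_mean_gdp <- df$gdpPercap * df$weight
  df
}

# Toy input reusing two rows of the table above (not the full dataset).
toy <- data.frame(
  country   = c("Afghanistan", "Afghanistan"),
  year      = c(1952, 1957),
  pop       = c(8425333, 9240934),
  gdpPercap = c(779.4453, 820.8530)
)

out <- process_data_sketch(toy)
print(out)
```

Because the toy input has only two rows, its weight and weighted_mean_gdp values will differ from the table; only newpop matches directly, since it does not depend on the other rows.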

Figures

Fig 1: Trend of mean GDP weighted by population over time, by continent

Fig 2: Scatterplot of mean GDP weighted by population vs. lifeExp, by continent

Dependency Diagram

remake::diagram() generates the dependency diagram from the remake.yml file, which is very convenient. Reading the diagram backwards, the final product report.html depends on the two figures and report.md. The two figures depend on processed_gapminder_data, which depends on gapminder_df, which in turn depends on gapminder.tsv. In other words, we first download gapminder.tsv using downl_tsv(). We then read it in as gapminder_df. Next, process_data() mutates gapminder_df to add weighted_mean_gdp, and the result is called processed_gapminder_data. Finally, plot_gdp_year() and plot_gdp_lifeExp() create two plots from processed_gapminder_data, and these plots together with report.md produce the final report.html.

remake::diagram(remake_file = "remake.yml")
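
To make the "reading backwards" idea concrete, the dependency chain described above can be written as a plain R list and walked recursively to get a valid build order. This is only an illustration in base R and does not use remake's internals. The names figure_1 and figure_2 are placeholders for the two figure targets (their real names are in remake.yml), and build_order() is a hypothetical helper.

```r
# Dependency edges as described in the text: target -> what it depends on.
# "figure_1" and "figure_2" are placeholder names for the two figure targets.
deps <- list(
  "report.html"              = c("figure_1", "figure_2", "report.md"),
  "figure_1"                 = "processed_gapminder_data",
  "figure_2"                 = "processed_gapminder_data",
  "processed_gapminder_data" = "gapminder_df",
  "gapminder_df"             = "gapminder.tsv",
  "gapminder.tsv"            = character(0),
  "report.md"                = character(0)
)

# Hypothetical helper: list targets so that every dependency comes
# before anything that depends on it (depth-first walk from `target`).
build_order <- function(target, deps, done = character(0)) {
  for (d in deps[[target]]) {
    if (!(d %in% done)) {
      done <- build_order(d, deps, done)
    }
  }
  c(done, target)
}

build_seq <- build_order("report.html", deps)
print(build_seq)
```

The walk starts at report.html and only emits a target once everything it depends on has been emitted, so gapminder.tsv comes first and report.html comes last, matching the order of steps in the paragraph above.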